A Multistage Gene Normalization System Integrating Multiple Effective Methods
Abstract
Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system consisting of four major subtasks: pre-processing, dictionary matching, ambiguity resolution, and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM test set. In the dictionary matching stage, we combine exact and approximate matching between gene names and the EntrezGene lexicon. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' assignment algorithm. In the last step, a Wikipedia-based filter removes false positives. Experimental results show that the presented system achieves an F-score of 90.1%, outperforming most state-of-the-art systems.
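The dictionary matching and ambiguity resolution stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the 0.85 similarity threshold, the function names, and the score matrix are all assumptions made for illustration, and the abstract does not say how the similarity scores are computed. The assignment step uses exhaustive search as a stand-in that solves the same optimisation problem Munkres' algorithm solves in polynomial time, so it is only suitable for very small inputs.

```python
from difflib import SequenceMatcher
from itertools import permutations

# Hypothetical toy lexicon (gene ID -> lowercase names); not real EntrezGene data.
LEXICON = {
    "G1": ["tumor protein p53", "tp53"],
    "G2": ["breast cancer 1", "brca1"],
    "G3": ["brca2"],
}


def match_candidates(mention, lexicon, threshold=0.85):
    """Return candidate gene IDs: exact matches first, else approximate ones.

    The threshold is an assumed value, not one taken from the paper.
    """
    text = mention.lower()
    exact = [gid for gid, names in lexicon.items() if text in names]
    if exact:
        return exact
    # Fall back to string similarity when no exact dictionary hit exists.
    return [
        gid
        for gid, names in lexicon.items()
        if any(SequenceMatcher(None, text, name).ratio() >= threshold for name in names)
    ]


def best_assignment(score):
    """Pick one distinct column per row so that the total score is maximal.

    score[i][j] is an assumed similarity between mention i and candidate
    gene j; the matrix needs at least as many columns as rows.
    Exhaustive search: same optimum as Munkres' algorithm, but factorial time.
    """
    n_rows, n_cols = len(score), len(score[0])
    best = max(
        permutations(range(n_cols), n_rows),
        key=lambda cols: sum(score[i][c] for i, c in enumerate(cols)),
    )
    return list(best)


if __name__ == "__main__":
    print(match_candidates("TP53", LEXICON))    # exact match
    print(match_candidates("brca-1", LEXICON))  # approximate match
    # Made-up similarity scores for two mentions and two candidate genes.
    print(best_assignment([[0.9, 0.8], [0.7, 0.1]]))
```

In the made-up score matrix, picking the highest score for the first mention greedily would leave the second mention with a poor candidate; solving the assignment jointly avoids this, which is the usual motivation for an assignment-based formulation.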
Similar resources
An integrated CRISPR-Cas toolkit for engineering human cells
The natively functioning Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) system is a prokaryotic adaptive immune system that confers resistance to foreign genetic elements including plasmids and phages. Very recently, a two-component CRISPR-Cas technology from Streptococcus pyogenes comprising the RNA-guided DNA endonuclease Cas9 and the guide RNA (gRNA) has b...
Gene expression profiling in the rhesus macaque: methodology, annotation and data interpretation.
Gene microarray analyses represent potentially effective means for high-throughput gene expression profiling in non-human primates. In the companion article, we emphasize effective experimental design based on the in vivo physiology of the rhesus macaque, whereas this article emphasizes considerations for gene annotation and data interpretation using gene microarray platforms from Affymetrix. I...
NTTMUNSW BioC modules for recognizing and normalizing species and gene/protein mentions
In recent years, the number of published biomedical articles has increased as researchers have focused on biological domains to investigate the functions of biological objects, such as genes and proteins. However, the ambiguous nature of genes and their products has rendered the literature more complex for readers and curators of molecular interaction databases. To address this challenge, a no...
Parser Adaptation for Social Media by Integrating Normalization
This work explores normalization for parser adaptation. Traditionally, normalization is used as a separate preprocessing step. We show that integrating the normalization model into the parsing algorithm is beneficial. To this end, we use a normalization model combined with the parsing as intersection algorithm. This way, multiple normalization candidates can be leverage...
Multistage Ring Network: A New Multiple Ring Network for Large Scale Multiprocessors
We present a new multiple ring network for multiprocessors, called the Multistage Ring Network (MRN). The MRN has a 2-level hierarchy of register insertion rings, and its interconnection of global rings forms a type of multistage network. The architecture of the MRN is effective at diffusing the global traffic to all global rings and the bandwidth of the network increases proportionally with...
Journal title:
Volume 8, Issue
Pages -
Publication date: 2013